Goto

Collaborating Authors

 precision matrix estimation


Machine Learning-Assisted High-Dimensional Matrix Estimation

arXiv.org Machine Learning

Efficient estimation of high-dimensional matrices--including covariance and precision matrices--is a cornerstone of modern multivariate statistics. Most existing studies have focused primarily on the theoretical properties of the estimators (e.g., consistency and sparsity), while largely overlooking the computational challenges inherent in high-dimensional settings. Theoretically, we first prove the convergence of LADMM, and then establish the convergence, convergence rate, and monotonicity of its reparameterized counterpart; importantly, we show that the reparameterized LADMM enjoys a faster convergence rate. Notably, the proposed reparameterization theory and methodology are applicable to the estimation of both high-dimensional covariance and precision matrices. Keywords: ADMM; High-dimensional; Learning-based optimization; Matrix estimation. 1. Introduction High-dimensional matrix estimation--covering both covariance and precision matrix estimation--constitutes a cornerstone of modern statistics and data science [1, 2, 3]. Accurate covariance estimation enables the characterization of dependence structures among a large number of variables [4, 5, 6], which is indispensable in diverse domains such as genomics [7, 8], neuroscience [9], finance [10, 11, 12], and climate science [13, 14]. Over the past two decades, substantial progress has been made in the statistical theory of high-dimensional matrix estimation, particularly with respect to the accuracy of estimators, including properties such as sparsistency and consistency [5, 15, 16]. However, in empirical studies, the dimensionality is often only on the order of tens to hundreds, and in many cases is comparable to the sample size [21, 22, 23, 24]. This observation highlights a notable gap between the statistical theory of estimators and the practical challenges of their computational implementation.




Non-Asymptotic Analysis of Data Augmentation for Precision Matrix Estimation

arXiv.org Machine Learning

This paper addresses the problem of inverse covariance (also known as precision matrix) estimation in high-dimensional settings. Specifically, we focus on two classes of estimators: linear shrinkage estimators with a target proportional to the identity matrix, and estimators derived from data augmentation (DA). Here, DA refers to the common practice of enriching a dataset with artificial samples--typically generated via a generative model or through random transformations of the original data--prior to model fitting. For both classes of estimators, we derive estimators and provide concentration bounds for their quadratic error. This allows for both method comparison and hyperparameter tuning, such as selecting the optimal proportion of artificial samples. On the technical side, our analysis relies on tools from random matrix theory. We introduce a novel deterministic equivalent for generalized resolvent matrices, accommodating dependent samples with specific structure. We support our theoretical results with numerical experiments.


A General Class of Model-Free Dense Precision Matrix Estimators

arXiv.org Machine Learning

We introduce prototype consistent model-free, dense precision matrix estimators that have broad application in economics. Using quadratic form concentration inequalities and novel algebraic characterizations of confounding dimension reductions, we are able to: (i) obtain non-asymptotic bounds for precision matrix estimation errors and also (ii) consistency in high dimensions; (iii) uncover the existence of an intrinsic signal-to-noise -- underlying dimensions tradeoff; and (iv) avoid exact population sparsity assumptions. In addition to its desirable theoretical properties, a thorough empirical study of the S&P 500 index shows that a tuning parameter-free special case of our general estimator exhibits a doubly ascending Sharpe Ratio pattern, thereby establishing a link with the famous double descent phenomenon dominantly present in recent statistical and machine learning literature.


Trans-Glasso: A Transfer Learning Approach to Precision Matrix Estimation

arXiv.org Machine Learning

Precision matrix estimation is essential in various fields, yet it is challenging when samples for the target study are limited. Transfer learning can enhance estimation accuracy by leveraging data from related source studies. We propose Trans-Glasso, a two-step transfer learning method for precision matrix estimation. First, we obtain initial estimators using a multi-task learning objective that captures shared and unique features across studies. Then, we refine these estimators through differential network estimation to adjust for structural differences between the target and source precision matrices. Under the assumption that most entries of the target precision matrix are shared with source matrices, we derive non-asymptotic error bounds and show that Trans-Glasso achieves minimax optimality under certain conditions. Extensive simulations demonstrate Trans Glasso's superior performance compared to baseline methods, particularly in small-sample settings. We further validate Trans-Glasso in applications to gene networks across brain tissues and protein networks for various cancer subtypes, showcasing its effectiveness in biological contexts. Additionally, we derive the minimax optimal rate for differential network estimation, representing the first such guarantee in this area.


Distributed Networked Multi-task Learning

arXiv.org Artificial Intelligence

--We consider a distributed multi-task learning scheme that accounts for multiple linear model estimation tasks with heterogeneous and/or correlated data streams. We assume that nodes can be partitioned into groups corresponding to different learning tasks and communicate according to a directed network topology. Each node estimates a linear model asynchronously and is subject to local (within-group) regularization and global (across groups) regularization terms targeting noise reduction and generalization performance improvement respectively. We provide a finite-time characterization of convergence of the estimators and task relation and illustrate the scheme's general applicability in two examples: random field temperature estimation and modeling student performance from different academic districts. Index T erms --Multi-task Learning, Distributed Optimization, Network-based computing systems, Multi-agent systems. N the current age of big data, many applications often face the challenge of processing large and complex datasets, which are usually not available in a single place but rather distributed across multiple locations. Approaches that require data to be aggregated in a central location may be subject to significant scalability and storage challenges. In other scenarios, data are scattered across different sites and owned by different individuals or organizations. Data privacy and security requirements make it difficult to merge such data in an easy way. In both contexts, Distributed Learning (DL) [1]-[3] can provide feasible solutions by building high-performance models shared among multiple nodes while maintaining user privacy and data confidentiality. DL aims to build a collective machine learning model based on the data from multiple computing nodes that can process and store data and are connected via networks. Nodes can utilize neighboring information to improve their own performance: rather than sharing raw data, they only exchange model information such as model parameters or gradients to avoid revealing sensitive information. This work was supported in part by the National Science Foundation under A ward ECCS-1933878 and in part by the Air Force Office of Scientific Research under Grant 15RT0767. Lingzhou Hong and Alfredo Garcia are with the Department of Industrial & Systems Engineering, Texas A&M University, College Station, TX 77843 USA (e-mail: { hlz, alfredo.garcia


Concentration of a sparse Bayesian model with Horseshoe prior in estimating high-dimensional precision matrix

arXiv.org Machine Learning

Precision matrices are crucial in many fields such as social networks, neuroscience, and economics, representing the edge structure of Gaussian graphical models (GGMs), where a zero in an off-diagonal position of the precision matrix indicates conditional independence between nodes. In high-dimensional settings where the dimension of the precision matrix $p$ exceeds the sample size $n$ and the matrix is sparse, methods like graphical Lasso, graphical SCAD, and CLIME are popular for estimating GGMs. While frequentist methods are well-studied, Bayesian approaches for (unstructured) sparse precision matrices are less explored. The graphical horseshoe estimate by \citet{li2019graphical}, applying the global-local horseshoe prior, shows superior empirical performance, but theoretical work for sparse precision matrix estimations using shrinkage priors is limited. This paper addresses these gaps by providing concentration results for the tempered posterior with the fully specified horseshoe prior in high-dimensional settings. Moreover, we also provide novel theoretical results for model misspecification, offering a general oracle inequality for the posterior.


Schur's Positive-Definite Network: Deep Learning in the SPD cone with structure

arXiv.org Machine Learning

Estimating matrices in the symmetric positive-definite (SPD) cone is of interest for many applications ranging from computer vision to graph learning. While there exist various convex optimization-based estimators, they remain limited in expressivity due to their model-based approach. The success of deep learning has thus led many to use neural networks to learn to estimate SPD matrices in a data-driven fashion. For learning structured outputs, one promising strategy involves architectures designed by unrolling iterative algorithms, which potentially benefit from inductive bias properties. However, designing correct unrolled architectures for SPD learning is difficult: they either do not guarantee that their output has all the desired properties, rely on heavy computations, or are overly restrained to specific matrices which hinders their expressivity. In this paper, we propose a novel and generic learning module with guaranteed SPD outputs called SpodNet, that also enables learning a larger class of functions than existing approaches. Notably, it solves the challenging task of learning jointly SPD and sparse matrices. Our experiments demonstrate the versatility of SpodNet layers.


Large Scale Distributed Sparse Precision Estimation

Neural Information Processing Systems

We consider the problem of sparse precision matrix estimation in high dimensions using the CLIME estimator, which has several desirable theoretical properties. We present an inexact alternating direction method of multiplier (ADMM) algorithm for CLIME, and establish rates of convergence for both the objective and optimality conditions. Further, we develop a large scale distributed framework for the computations, which scales to millions of dimensions and trillions of parameters, using hundreds of cores. The proposed framework solves CLIME in columnblocks and only involves elementwise operations and parallel matrix multiplications. We evaluate our algorithm on both shared-memory and distributed-memory architectures, which can use block cyclic distribution of data and parameters to achieve load balance and improve the efficiency in the use of memory hierarchies. Experimental results show that our algorithm is substantially more scalable than state-of-the-art methods and scales almost linearly with the number of cores.